Cross-Language Domain Adaptation for Classifying Crisis-Related Short Messages

نویسندگان

  • Muhammad Imran
  • Prasenjit Mitra
  • Jaideep Srivastava
چکیده

Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in machine training. Can we reuse old tweets to train classifiers? How can we choose labeled tweets for training? Specifically, we study the usefulness of labeled data of past events. Do labeled tweets in different language help? We observe the performance of our classifiers trained using different combinations of training sets obtained from past disasters. We perform extensive experimentation on real crisis datasets and show that the past labels are useful when both source and target events are of the same type (e.g. both earthquakes). For similar languages (e.g., Italian and Spanish), cross-language domain adaptation was useful, however, when for different languages (e.g., Italian and English), the performance decreased.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Cross-Language Domain Adaptation

Rapid crisis response requires real-time analysis of messages. After a disaster happens, volunteers attempt to classify tweets to determine needs, e.g., supplies, infrastructure damage, etc. Given labeled data, supervised machine learning can help classify these messages. Scarcity of labeled data causes poor performance in machine training. Can we reuse old tweets to train classifiers? How can ...

متن کامل

Twitter Translation using Translation-Based Cross-Lingual Retrieval

Microblogging services such as Twitter have become popular media for real-time usercreated news reporting. Such communication often happens in parallel in different languages, e.g., microblog posts related to the same events of the Arab spring were written in Arabic and in English. The goal of this paper is to exploit this parallelism in order to eliminate the main bottleneck in automatic Twitt...

متن کامل

Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages

Microblogging platforms such as Twitter provide active communication channels during mass convergence and emergency events such as earthquakes, typhoons. During the sudden onset of a crisis situation, affected people post useful information on Twitter that can be used for situational awareness and other humanitarian disaster response efforts, if processed timely and effectively. Processing soci...

متن کامل

#Unconfirmed: Classifying Rumor Stance in Crisis-Related Social Media Messages

It is well-established that within crisis-related communications, rumors are likely to emerge. False rumors, i.e. misinformation, can be detrimental to crisis communication and response; it is therefore important not only to be able to identify messages that propagate rumors, but also corrections or denials of rumor content. In this work, we explore the task of automatically classifying rumor s...

متن کامل

Tweet4act: Using incident-specific profiles for classifying crisis-related messages

We present Tweet4act, a system to detect and classify crisis-related messages communicated over a microblogging platform. Our system relies on extracting content features from each message. These features and the use of an incident-specific dictionary allow us to determine the period type of an incident that each message belongs to. The period types are: pre-incident (messages talking about pre...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1602.05388  شماره 

صفحات  -

تاریخ انتشار 2016